Storage-centric Load Management for Data Streams with Update Semantics
Abstract
Most data stream processing systems model their inputs as append-only sequences of data elements. In this model, the application expects to receive a query answer on the complete input stream. However, there are many situations in which each data element (or a window of data elements) in the stream is in fact an update to a previous one, so the most recent arrival is all that really matters to the application. UpStream defines a storage-centric approach to efficiently processing continuous queries under such an update-based stream data model. The goal is to provide the most up-to-date answers to the application with the lowest possible staleness. To achieve this, we developed a lossy tuple storage model (called an “update queue”) which, under high load, sacrifices old tuples in favor of newer ones using a number of different update key scheduling heuristics. Our techniques correctly process queries with different types of streaming operators (including sliding windows), while efficiently handling large numbers of update keys with different update frequencies. We present a detailed analysis and experimental evidence showing the effectiveness of our algorithms on both synthetic and real data sets.
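The update-queue idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the data structure or the scheduling heuristics, so everything here is an assumption made for illustration: the class name `UpdateQueue`, the choice of one pending slot per update key, and the FIFO service order of keys are hypothetical and are not taken from UpStream itself.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional, Tuple


class UpdateQueue:
    """Hypothetical lossy queue: at most one pending tuple per update key."""

    def __init__(self) -> None:
        # Maps update key -> latest pending value. Insertion order doubles as
        # the service order of keys (FIFO by first pending arrival, which is
        # an assumed heuristic, not necessarily one used by UpStream).
        self._pending: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.dropped = 0  # count of stale tuples overwritten before processing

    def enqueue(self, key: Hashable, value: Any) -> None:
        """Store a new tuple, replacing any unprocessed tuple with the same key."""
        if key in self._pending:
            self.dropped += 1  # the older tuple for this key is sacrificed
        # Assigning to an existing key keeps that key's position in the order.
        self._pending[key] = value

    def dequeue(self) -> Optional[Tuple[Hashable, Any]]:
        """Return the (key, value) pair of the next key to serve, or None if empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self) -> int:
        return len(self._pending)
```

In this sketch, enqueueing two tuples for the same key before either is dequeued leaves only the newer one in the queue, so a slow consumer always sees the freshest value per key and the queue length is bounded by the number of distinct keys. A different key scheduling heuristic would change only the order in which `dequeue` picks keys.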
Related resources
UpStream: Storage-centric Load Management for Data Stream Processing Systems
Processing fast updating data streams in real-time must reflect the most recent data. A number of technologies including Data Stream Management Systems have emerged to respond to this challenge. While running their queries in a continuous fashion on high-volume push-based data streams (e.g. sensor data, GPS coordinates, stock quotes), one of the most important optimization problems that these s...
UpStream: A Storage-centric Load Management System for Real-time Update Streams
UpStream is a framework for load management over data streams with update semantics. It provides a novel storage manager architecture that can be plugged into data stream processing engines for serving streaming applications that require low-staleness results over real-time continuous queries. We propose to demonstrate the key aspects of the UpStream architecture as well as its performance usin...
Deterministic Load-Balancing Schemes for Disk-Based Video-On-Demand Storage Servers
A video-on-demand (VOD) storage server is a parallel, storage-centric system used for playing a large number of relatively slow streams of compressed digitized video and audio concurrently. Data is read from disks in relatively large chunks, and is then “streamed” out onto a distribution network. The primary design goal is to maximize the ratio of the number of concurrent streams to system cost...
Design and Evaluation of an Autonomous Load Balancing System for Mobile Data Stream Processing Based On a Data Centric Publish Subscribe Approach
Several new applications of mobile computing environments, such as Intelligent Transportation Systems, Fleet Management and Logistics, and integrated Industrial Process Automation share the requirement of remote monitoring and high performance processing of huge data streams produced by large sets of mobile nodes. Two key requirements for the deployment and operation of such mobile infrastructu...
A Quality-Centric Data Model for Distributed Stream Management Systems
It is challenging for large-scale stream management systems to return always perfect results when processing data streams originating from distributed sources. Data sources and intermediate processing nodes may fail during the lifetime of a stream query. In addition, individual nodes may become overloaded due to processing demands. In practice, users have to accept incomplete or inaccurate quer...
Publication date: 2009